The libraries covered before (BeautifulSoup, lxml, Scrapy) provide a user-friendly interface for getting data from the web using the HTML source of a page. Sometimes, however, that HTML is not directly available. It is created as the output of a function (usually JavaScript) that is triggered by user input. For example, the data on a page may appear only after clicking a button, filling in a form or choosing a value from a filter. A simple request to the URL will not return that data, because no user interaction has taken place. In this case, you need to write Python code that acts as a web browser. Many libraries provide this functionality, but we will concentrate on one of the most popular among them, called Selenium. First of all, you need to install it by running the following command in the command prompt:
pip install selenium
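As a minimal sketch of how Selenium fits together with the libraries covered before, the helper below lets a Selenium-driven browser load a page (so its JavaScript can run) and then hands the HTML the browser currently holds, available as driver.page_source, to BeautifulSoup. The function names get_rendered_html and link_texts are hypothetical, chosen for this illustration only. Content that appears only after a click or a delay may additionally need the waits and clicks shown later in this notebook.

```python
from bs4 import BeautifulSoup


def get_rendered_html(driver, url):
    """Load url in a Selenium-driven browser and return driver.page_source,
    i.e. the HTML of the page as the browser currently holds it."""
    driver.get(url)
    return driver.page_source


def link_texts(html):
    """Parse HTML with BeautifulSoup and return the text of every <a> tag."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get_text(strip=True) for a in soup.find_all("a")]


if __name__ == "__main__":
    from selenium import webdriver

    browser = webdriver.Chrome()
    try:
        html = get_rendered_html(browser, "https://inventwithpython.com")
        print(link_texts(html))
    finally:
        browser.quit()  # always end the browser session
```

The point of the split is that Selenium only does the "act like a browser" part; once the page is rendered, parsing can stay in the tools you already know.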
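Once Selenium is installed, you need to download the WebDriver for your browser (ChromeDriver for Chrome, geckodriver for Firefox) to your local directory. With Selenium 4.6 or newer, the bundled Selenium Manager can usually locate or download a matching driver automatically, so this step may not be needed. For example, if your notebook is inside the Data_Scraping folder and you are using the Chrome or Firefox web browser, you may download the drivers into that folder from here: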
Now you are ready to move to the code. Let's write a script that opens the Chrome browser, goes to www.inventwithpython.com, finds the hyperlink titled "Read It Online" (locating it directly by its link text) and clicks on it.
In [2]:
from selenium import webdriver
from selenium.webdriver.common.by import By

# replace Chrome() below with Firefox() if that is the driver you decided to use
browser = webdriver.Chrome()
url = 'http://inventwithpython.com'
browser.get(url)
# Selenium 4.3+ removed find_element_by_link_text(); use find_element(By.LINK_TEXT, ...)
our_element = browser.find_element(By.LINK_TEXT, 'Read It Online')
type(our_element)  # a WebElement object
our_element.click()  # follows the "Read It Online" link
In [3]:
browser.close()
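Let's do a similar task for Yahoo Mail. These are the steps to take: open the login page, type in the email address, click the Next button, then type in the password and submit the form.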
In [4]:
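from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
browser.implicitly_wait(10)  # wait up to 10 s for elements (e.g. the password field) to appear
browser.get('https://mail.yahoo.com')
email_element = browser.find_element(By.ID, 'login-username')
email_element.send_keys('hrantdavtyan@yahoo.com')
next_button_element = browser.find_element(By.ID, 'login-signin')
next_button_element.click()  # the password field is shown only after this click
password_element = browser.find_element(By.ID, 'login-passwd')
password_element.send_keys('my_password')  # placeholder, not a real password
password_element.submit()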
In [ ]:
browser.close()
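Once you have a WebElement object (such as our_element above), you can use the following attributes and methods on it:
Attribute or method | Description |
---|---|
tag_name | The tag name, such as 'a' for an <a> element |
get_attribute(name) | The value for the element’s name attribute |
text | The text within the element, such as 'hello' in <span>hello</span> |
clear() | For text field or text area elements, clears the text typed into it |
is_displayed() | Returns True if the element is visible; otherwise returns False |
is_enabled() | For input elements, returns True if the element is enabled; otherwise returns False |
is_selected() | For checkbox or radio button elements, returns True if the element is selected; otherwise returns False |
location | A dictionary with keys 'x' and 'y' for the position of the element in the page |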
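To see these attributes together, a small helper can collect them into a plain dictionary, which is handy when debugging why a scraper picked the wrong element. This is a sketch; describe_element is a hypothetical name, and it only calls the attributes and methods from the table. The demo assumes the inventwithpython.com page still has a "Read It Online" link.

```python
def describe_element(element):
    """Gather common WebElement attributes/methods into a dict."""
    return {
        "tag_name": element.tag_name,
        "text": element.text,
        "href": element.get_attribute("href"),  # may be None if there is no such attribute
        "displayed": element.is_displayed(),
        "enabled": element.is_enabled(),
        "location": element.location,  # {'x': ..., 'y': ...}
    }


if __name__ == "__main__":
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    browser = webdriver.Chrome()
    try:
        browser.get("https://inventwithpython.com")
        link = browser.find_element(By.LINK_TEXT, "Read It Online")
        print(describe_element(link))
    finally:
        browser.quit()
```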
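Besides ordinary text, send_keys() accepts special keys, which are attributes of the Keys class in selenium.webdriver.common.keys:
Attributes | Meanings |
---|---|
Keys.DOWN, Keys.UP, Keys.LEFT, Keys.RIGHT | The keyboard arrow keys |
Keys.ENTER, Keys.RETURN | The ENTER and RETURN keys |
Keys.HOME, Keys.END, Keys.PAGE_DOWN, Keys.PAGE_UP | The HOME, END, PAGE DOWN, and PAGE UP keys |
Keys.ESCAPE, Keys.BACK_SPACE, Keys.DELETE | The ESC, BACKSPACE, and DELETE keys |
Keys.F1, Keys.F2, ..., Keys.F12 | The F1 to F12 keys at the top of the keyboard |
Keys.TAB | The TAB key |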
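Selenium can also use the browser's own buttons through the following methods:
Method name | Description |
---|---|
browser.back() | Clicks the Back button. |
browser.forward() | Clicks the Forward button. |
browser.refresh() | Clicks the Refresh/Reload button. |
browser.quit() | Closes all browser windows and ends the WebDriver session (browser.close() closes only the current window). |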
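The navigation methods can be combined to move through the browser history. The sketch below is hypothetical (visit_then_go_back is an illustrative name): it opens two pages, goes back and forward, reloads the page and returns the URL seen after the Back and Forward steps. A real browser may report slightly normalized URLs, for example with a trailing slash added.

```python
def visit_then_go_back(driver, first_url, second_url):
    """Open two pages in turn, press Back, then Forward, then Refresh.
    Returns (url_after_back, url_after_forward)."""
    driver.get(first_url)
    driver.get(second_url)
    driver.back()
    after_back = driver.current_url
    driver.forward()
    after_forward = driver.current_url
    driver.refresh()  # reloads the page we ended up on
    return after_back, after_forward


if __name__ == "__main__":
    from selenium import webdriver

    browser = webdriver.Chrome()
    try:
        print(visit_then_go_back(browser, "https://inventwithpython.com", "https://www.python.org"))
    finally:
        browser.quit()  # ends the whole session, not just the current window
```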